Question-1

Caption

Question-2

3)

Use ggplot2 and the function from step 2 to create a density plot of Infection risk in which outliers are plotted as a diamond symbol ( ◊ ) . Make some analysis of this graph. We can see from the above graph, it resembles the shape of a normal distribution which majority of the observations around 4.5.There are only a few outliers i.e 5 outliers to be exact.

4)

Produce graphs of the same kind as in step 3 but for all other quantitative variables in the data (to iterate over several column names, you can use .data pronoun, i.e. aes(x=.data[variable_that_contains_column_name_as_string], â€Ĥ) ). Put these graphs into one (hint: arrangeGrob() in gridExtra package can be used) and make some analysis. We notice outliers for all the variables except for the variable “Available Facilities & Services”(V12). We also see skewness in some variables like “Number of Beds” (V7), “Average Daily Census”(V10) and “Number of Nurses”(V11).

5)

Create a ggplot2 scatter plot showing the dependence of Infection risk on the Number of Nurses where the points are colored by Number of Beds. Is there any interesting information in this plot that was not visible in the plots in step 4? What do you think is a possible danger of having such a color scale?

We dont see anything prominent in the above graph to mention any insights.

With regards to the color scale, having such a large color scale can be overwhelming plus having different shades of the colors makes it harder to analyze the graph to easily gain insights from it.

6)

Convert graph from step 3 to Plotly with ggplotly function. What important new functionality have you obtained compared to the graph from step 3? Make some additional analysis of the new graph.

Upon using plotly, we are able to hover over the graph and actually get the corresponding values for various locations on the graph like the outliers, we can hover over those outliers to easily identify its values. Additionally, we can also zoom in to a particular part of the graph we might be interested to analyse further.

7)

Use data plot-pipeline and the pipeline operator to make a histogram of Infection risk in which outliers are plotted as a diamond symbol ( ◊ ) . Make this plot in the Plotly directly (i.e. without using ggplot2 functionality). Hint: select(), filter() and is.element() functions might be useful here.

8)

Write a Shiny app that produces the same kind of plot as in step 4 but in addition include: a. Checkboxes indicating for which variables density plots should be produced b. A slider changing the bandwidth parameter in the density estimation (‘bw’ parameter) Comment how the graphs change with varying bandwidth. Is there any bandwidth that is optimal for all of graphs?

Shiny applications not supported in static R Markdown documents

The default bandwidth value that was provided is 1. And from the graphs, we can see that as we increase this value , the smoother the graphs becomes.

Appendix

Code

knitr::opts_chunk$set(
    fig.align = "center",
    fig.height = 3,
    fig.width = 5,
    message = FALSE,
    warning = FALSE
)
set.seed(12345)
knitr::include_graphics("image-1.pdf")
##### Question-2
#1)
setwd("~/Downloads/M A S T E R S /Sem-3/Part-1/Visualization/labs/lab-1")

data = read.table("SENIC.txt",header = FALSE)

#2)
outlier <- function(variable){
  
  quants <- quantile(variable)
  
  Q1 <- quants[2]
  Q3 <- quants[4]
  
  upper <- Q3+1.5*(Q3-Q1)
  lower <- Q1-1.5*(Q3-Q1)
  
  value1 <- which(variable > upper)
  value2 <- which(variable < lower)
  
  final <- c(value1,value2)
  
  return(final)
}

#3)
library(ggplot2)

ind <- outlier(data$V4)
values <- data$V4[ind]

viz <- ggplot(data = data, aes(x = V4)) +  geom_density(kernel = "gaussian") + geom_point(data = data.frame(values), aes(x = values, y = 0),shape = 23, color = "red") + labs( x = "Infection risk", y = "Density")

viz 
#4)
library(gridExtra)

col_names <- c("V2","V3","V5","V6","V7","V10","V11","V12")

col_real <- c("Length of Stay ", "Age", "Routine Culturing Ratio", "Routine Chest X-ray Ratio", "Number of Beds ", "Average Daily Census", "Number of Nurses", "Available Facilities & Services")

final<-list()

i <- 1
for(cols in col_names){
  
  ind <- outlier(data[[cols]])
  values <- data[[cols]][ind]
  
  out <- ggplot(data = data, aes(x = .data[[cols]])) +  geom_density(kernel = "gaussian") + geom_point(data = data.frame(values), aes(x = values, y = 0),shape = 23, color = "red") + labs( x = col_real[i] , y = "Density")
  
  final[[cols]]<-out
  
  i <- i + 1

}


plots <- arrangeGrob(grobs = final, ncol = 5, nrow = 2)  

grid.arrange(plots)


#5)
ggplot(data,aes(V4,V11,color = factor(V7))) + geom_point() + labs( x = "Infection risk", y = "Number of Nurses")
#6)
library(plotly)

ggplotly(viz)

#7)
ind <- outlier(data$V4)
values <- data$V4[ind]

quants <- quantile(data$V4)

Q1 <- quants[2]
Q3 <- quants[4]
  
upper <- Q3+1.5*(Q3-Q1)
lower <- Q1-1.5*(Q3-Q1)
  


data %>% plot_ly(x= ~V4, type = "histogram") %>% select(V4) %>% filter(V4 > upper | V4 < lower) %>% add_trace(type = "scatter", mode = "markers", marker = list(symbol = "diamond")) %>%  layout(xaxis = list(title = 'Infection risk'), showlegend = FALSE)

#8)
library(shiny)


colnames(data) <- c("Identification Number","Length of Stay ", "Age","Infection Risk", "Routine Culturing Ratio", "Routine Chest X-ray Ratio", "Number of Beds ","Medical School Affiliation","Region", "Average Daily Census", "Number of Nurses", "Available Facilities & Services")

ui <- fluidPage(
  
   checkboxGroupInput("variable", "Variables to show:",c("Length of Stay ", "Age","Infection Risk", "Routine Culturing Ratio", "Routine Chest X-ray Ratio", "Number of Beds ", "Average Daily Census", "Number of Nurses", "Available Facilities & Services")),
   sliderInput("integer", "BW bandwidth:",min = 0, max = 100,value = 1), 
   
   plotOutput("plots")
   
)

# Define server actions
server <- function(input, output) {
  # Use input 
  # Save updates to output 
   
  output$plots <- renderPlot({
    
  final_2<-list()
  col_names_2 <- input$variable
  for(cols in col_names_2){
    
    ind <- outlier(data[[cols]])
    values <- data[[cols]][ind]
    
    out <- ggplot(data = data, aes(x = .data[[cols]])) +  geom_density(kernel = "gaussian",bw = input$integer) + geom_point(data = data.frame(values), aes(x = values, y = 0),shape = 23, color = "red") + labs( x = cols , y = "Density")
    
    final_2[[cols]]<-out
  
  }
  
  plots <- arrangeGrob(grobs = final_2, ncol = 5, nrow = 2)  

  #print(final_2)
  #print(input$variable)
  #grid.arrange(plots)
  plot(plots)
  #print(plots)
    
  })
  
}

# Run the application 
shinyApp(ui = ui, server = server)